Scale-free network growth by ranking 
Network growth is currently explained through mechanisms that rely on node prestige measures, 
such as degree or fitness. In many real networks those who create and connect nodes do not know 
the prestige values of existing nodes, but only their ranking by prestige. We propose a criterion 
of network growth that explicitly relies on the ranking of the nodes according to any prestige 
measure, be it topological or not. The resulting network has a scale-free degree distribution when 
the probability to link a target node is any power law function of its rank, even when one has 
only partial information of node ranks. Our criterion may explain the frequency and robustness of 
scale-free degree distributions in real networks, as illustrated by the special case of the Web graph. 

Scholars have become interested in the many complex
networks with long-tailed degree distributions [1, 2, 3] due
to their peculiar structural features, like resilience [4], and
to the critical dynamical processes taking place on these
networks, including epidemic spreading, search,
and opinion formation. The most popular explanation
for the origin of scale-free networks is preferential
attachment, according to which a newly created node is
connected to a pre-existing one with a probability exactly
proportional to the number of links (degree) of the target
node. This mechanism embodies the intuitive idea of a
'rich get richer' dynamics. In the limit of infinitely many
nodes, the degree distribution of the resulting network
has a power law tail with exponent $\gamma = 3$. Preferential
attachment has been explicitly or implicitly embodied in
many successive models of network growth.

Krapivsky and Redner [11] showed that the proportionality
between linking probability and target node degree
is a necessary ingredient of preferential attachment;
if the linking probability is a power of the degree with
exponent $\alpha$, the resulting network has a power law
degree distribution only when $\alpha = 1$. For $\alpha < 1$ the
degree distribution is a power law multiplied by a stretched
exponential, and for larger values of $\alpha$ the model yields
star-like networks. This seems at odds with the abundance
and robustness of scale-free degree distributions in
real networks. Other proposed mechanisms do not rely
on preferential attachment. If the attraction of links
depends on some "fitness" property of the target node, the
networks display scale-free degree distributions for some
suitable choices of the fitness distribution.

Whether the link attractiveness of a node depends on a 
prestige measure exogenous or endogenous to the network 
topology, this information may not be available in real 
cases. In a social network, for example, we could assume 
that the probability for a person to make new friends is 
proportional to properties such as popularity, attractive- 
ness, or wealth — all typically difficult for strangers to 
quantify, measure, or discover. 



While the absolute importance of an object is often 
unknown, it is quite common to have a clear idea about 
the relative values of two objects. One can often say 
who is the richer or more popular between two individ- 
uals. As another example, search engine users only see 
how Web pages are ranked. The perception of how items 
are ranked requires far less information than their ac- 
tual importance. Here we propose a model of network 
growth that focuses on the relative rather than absolute 
importance of the nodes, which are ranked according to 
an arbitrary prestige measure. We show that scale-free 
networks emerge for a very general form of the linking 
probability and are stable for a large range of the param- 
eters describing the growth. The result holds even if new 
nodes have information on only subsets of older nodes. 

First, a prestige ranking criterion is selected. At the
$(t+1)$-th iteration, the new node $t+1$ is created and
new links are set from it to $m$ pre-existing nodes. The
previous $t$ nodes are ranked according to prestige, and
the linking probability $p(t+1 \to j)$ that node $t+1$ be
connected to node $j$ only depends on the rank $R_j$ of $j$:



$$p(t+1 \to j) = \frac{R_j^{-\alpha}}{\sum_{k=1}^{t} R_k^{-\alpha}} \tag{1}$$



where $\alpha > 0$ is a real-valued parameter. The linking
probability clearly decreases with increasing rank, so the
top-ranked nodes (small $R_j$) are the most likely targets.
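
As a minimal illustration of Eq. (1), the following Python sketch turns a list of ranks into linking probabilities. The function name `linking_probabilities` and the sample values are assumptions made for this example only; they do not come from the original work.

```python
def linking_probabilities(ranks, alpha):
    """Linking probabilities of Eq. (1) for the given node ranks.

    ranks: positive ranks R_j of the existing nodes (1 = most prestigious).
    alpha: exponent of the power law of rank; must be positive.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Unnormalized weight of each node: R_j ** (-alpha).
    weights = [rank ** (-alpha) for rank in ranks]
    total = sum(weights)
    # Dividing by the sum over all existing nodes normalizes the weights.
    return [weight / total for weight in weights]


# Hypothetical example: four existing nodes ranked 1..4, alpha = 1.
probs = linking_probabilities([1, 2, 3, 4], alpha=1.0)
```

With these sample values the weights are 1, 1/2, 1/3 and 1/4, so the top-ranked node receives 12/25 of the total probability.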

The choice of prestige measure is arbitrary. We discuss
both topological measures (age $t$ and degree $k$) and
exogenous ones (any node fitness $\eta$).

FIG. 1: Inset: In-degree distributions for networks built
according to our model (ranking by degree). The number of
nodes is $N = 10^6$. Data are averaged within logarithmic bins
of degree and shifted along the $y$-axis for illustration purposes.
The main curve plots the degree distribution exponent $\gamma$ as a
function of $\alpha$, from simulations with nodes ranked by various
criteria. Error bars represent standard errors on the best fit
estimates of $\gamma$, while the dashed line is the prediction of Eq. (4).

If the nodes are sorted by age, from the oldest to the
newest, the label of each node coincides with its rank,
i.e. $R_t = t$ $\forall t$. In this special case, our linking
probability coincides with that of the so-called static network
model [14]. We can calculate the number of links that
the $R$-th node attracts after its creation. Suppose
that the evolution of the network stops when $N$ nodes
have been created. At each iteration a constant number $m$ of
links are created between the new node and the older
ones. The expected total number of links that the
$R$-th node has attracted at the end is



$$k_R^N = \sum_{t=R+1}^{N} \frac{m R^{-\alpha}}{\sum_{j=1}^{t-1} j^{-\alpha}} \tag{2}$$



The first sum runs over $N - R$ terms because $N - R$
nodes are created after node $R$, and each of them can be
connected to $R$. We now approximate the sums with
integrals and assume that $N \gg R$, as we are ultimately
interested in the thermodynamic limit. We find that the
expected number of links of node $R$ is
$k_R^N = A R^{-\alpha}$, where $A$ is a function of $N$, $\alpha$ and $m$.
Knowing $k_R^N$, it is possible to find how many nodes $N_k$
have the same expected number of links $k$. The ratio
$N_k/N$ in the limit of large $N$ yields the probability
$p(k, N)$ that a node of the network has degree $k$:



$$p(k, N) \sim k^{-(1 + 1/\alpha)} \tag{3}$$



Eq. (3) shows that the degree distribution of the network
follows a power law with exponent

$$\gamma = 1 + 1/\alpha \tag{4}$$



for any value of $\alpha$. Since $\gamma$ can take any value greater
than 1, we can in principle reproduce the exponents
measured in real systems. For $\alpha > 1$ ($\gamma < 2$) a few nodes
attract a finite fraction of all links (condensation); in the
limit case in which the power law of Eq. (1) is replaced by
a simple exponential, the network still has a long-tailed
degree distribution with $\gamma = 1$ (as in the limit $\alpha \to \infty$).
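
The step from the expected degree of the $R$-th node to the degree distribution can be spelled out as follows. This is a sketch under two assumptions: each rank is occupied by exactly one node, and $R$ is treated as a continuous variable. Writing $k$ for the expected degree $A R^{-\alpha}$ and inverting,

$$k = A R^{-\alpha} \;\Rightarrow\; R(k) = \left(\frac{k}{A}\right)^{-1/\alpha}, \qquad N_k \simeq \left|\frac{dR}{dk}\right| = \frac{A^{1/\alpha}}{\alpha}\, k^{-(1 + 1/\alpha)} .$$

Dividing by $N$ gives $p(k, N) \propto k^{-(1 + 1/\alpha)}$, that is $\gamma = 1 + 1/\alpha$. For instance, $\alpha = 1$ gives $\gamma = 2$ and $\alpha = 1/2$ gives $\gamma = 3$.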

Let us now consider a more realistic ranking criterion,
the in-degree. The number of incoming links of a node
represents how many times the node has been selected by
its peers. For undirected networks we can equivalently
use the degree. Nodes with (in-)degree zero, which if
present are a problem for the extension of other growth
models to directed networks, do not raise an issue here
because they have ranks expressed by positive numbers,
like all other nodes.

To see what kind of networks emerge with this new
prestige measure, we cannot apply the above derivation
because the degree-based ranking of a node can change
over time. On the other hand, for a growing network
there is a strong correlation between the age of a node
and its degree, as older nodes have more chances to
receive links. Furthermore, the ranking of nodes according
to degree is quite stable. Therefore we expect the
same result as for the ranking by age.

To verify our expectation we performed Monte Carlo
simulations of the network growth process with the new
degree-based ranking strategy. The inset of Fig. 1 shows
the degree distributions of four networks, corresponding
to various values of the exponent $\alpha$. In the logarithmic
scale of the plot the tails appear as straight lines, as one
would expect for scale-free distributions. To verify that
the relation between $\alpha$ and the exponent $\gamma$ of the degree
distribution is the one predicted by our model, in the
main plot of Fig. 1 we compare various pairs $(\alpha, \gamma)$ with
the hyperbola of Eq. (4). The agreement is evident.
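
A growth process of this kind can be sketched in a few lines of Python. This is not the simulation code behind Fig. 1, only a small illustration under stated assumptions: one link per new node ($m = 1$), ties in in-degree broken by age (older first), and a network far smaller than $N = 10^6$ because the nodes are re-sorted at every step. The name `grow_by_degree_rank` is hypothetical.

```python
import bisect
import random


def grow_by_degree_rank(n_nodes, alpha, seed=0):
    """Grow a network in which each new node links to one existing node,
    chosen with probability proportional to rank ** (-alpha), where the
    rank is the position of the node in the ordering by in-degree.
    Returns the list of in-degrees, indexed by order of creation."""
    rng = random.Random(seed)
    # cumulative[r - 1] = sum of s ** (-alpha) for s = 1..r
    cumulative = []
    running = 0.0
    for rank in range(1, n_nodes + 1):
        running += rank ** (-alpha)
        cumulative.append(running)

    in_degree = [0]  # the network starts from a single node
    for t in range(1, n_nodes):
        # Rank the t existing nodes: highest in-degree first; the stable
        # sort keeps older nodes ahead of newer ones when degrees tie.
        order = sorted(range(t), key=lambda j: -in_degree[j])
        # Draw a rank with probability rank ** (-alpha) / cumulative[t - 1].
        u = rng.random() * cumulative[t - 1]
        idx = min(bisect.bisect_right(cumulative, u, 0, t), t - 1)
        in_degree[order[idx]] += 1
        in_degree.append(0)  # the new node enters with in-degree zero
    return in_degree


degrees = grow_by_degree_rank(400, alpha=1.0, seed=1)
```

Nodes with in-degree zero need no special treatment, since they still occupy a positive rank. Estimating $\gamma$ from such a run would require much larger networks and logarithmic binning, as in Fig. 1.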

A striking feature of our model is that it generates
scale-free networks even when ranking nodes by a prestige
measure unrelated to the network topology, i.e. by some
exogenous fitness attribute of the nodes. Suppose we
assign a fitness $\eta$ to the nodes according to an arbitrary
distribution. Let $R_j(t)$ be the expected rank of node $j$,
with fitness $\eta_j$, among $t$ nodes. As $R_j(t)$ is
asymptotically proportional to $t$, we can write $R_j(t) = t\rho(\eta_j)$,
where the relative rank $\rho(\eta_j)$ depends only on $\eta_j$. By
replacing the rank in the linking probability of Eq. (1) by
the expression $t\rho(\eta_j)$, we can factor out $t^{-\alpha}$ from both
the numerator and the denominator, leading us back to
the case of static ranking discussed above.
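
Written out, with $\rho(\eta)$ denoting the relative rank of a node with fitness $\eta$ and the expected rank used in place of the actual one, the cancellation reads

$$p(t+1 \to j) = \frac{[t\,\rho(\eta_j)]^{-\alpha}}{\sum_{k=1}^{t} [t\,\rho(\eta_k)]^{-\alpha}} = \frac{\rho(\eta_j)^{-\alpha}}{\sum_{k=1}^{t} \rho(\eta_k)^{-\alpha}} .$$

The factor $t^{-\alpha}$ drops out, so the relative weight of two given nodes no longer changes as the network grows, just as in the age-ranked case.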

Monte Carlo simulations confirm the result. We used
uniform, exponential and power law fitness distributions.
The resulting networks have degree distributions with
power law tails, for any value of $\alpha$ and any fitness
distribution. The relation between the exponents $\gamma$ and $\alpha$
is in agreement with the prediction of Eq. (4) in all cases.
In Fig. 1 we illustrate this relation for the uniform and
exponential cases.

Most models of network growth assume that a new
node can be linked to any existing node, chosen according
to some criterion. This requires that the new node
be aware of the status of all its peers. Such an assumption
of complete knowledge of the network may not be
realistic. For instance, in a large social network nobody
knows everybody else. It is reasonable to suppose that
the knowledge that each node has of the network is limited
to a sample of nodes, which is in general different
from node to node. In the previous example, each person
has his/her own group of acquaintances. Having access
only to portions of a network can profoundly affect the
dynamics of network growth. Mossa et al. showed
that, under this restriction, preferential attachment would
yield power law degree distributions ending with
exponential cutoffs.

Let us check whether and to which extent the hypoth- 
esis of limited information affects the dynamics of the 
rank-driven model proposed here. We assume that when- 
ever a new node is created, it can 'see' each of the preex- 
isting nodes with a probability h. We shall see that if h 
is constant, our earlier result holds; if h is power-law dis- 
tributed, the scenario is more complex, but one recovers 
long-tailed degree distributions in most cases. 

If $h$ is constant, the size distribution of the subsets
accessed by the new nodes is a binomial peaked at $ht$, where
$t$ is the number of nodes of the network after $t$ iterations.
This means that most subsets will have a size of about
$ht$. Let us assume that the nodes are ranked according
to their age, making a formal analysis possible. The
linking probability is still given by Eq. (1), with the important
caveat that now we deal only with the nodes within the
subset accessed by the newly created node. So the ranks
of Eq. (1) refer to the ordering of the nodes of the subset,
not of all nodes like before. Let us indicate the 'local'
ranks with $r$, to distinguish them from the global ranks
$R$ we have dealt with so far.

When a new node $t + 1$ is created, it knows a list
of $n$ older nodes. One can calculate the probability
$p(R, r, t, n, h)$ that node $R$ has local rank $r$ within this
list, and from this the probability $p(t+1 \to R, r, n, h)$
that such a node be linked to $t+1$. Then we sum over the
possible ranks $r$ of node $R$ in the list ($r \in 1 \ldots n$) and all
possible subset sizes ($n \in 1 \ldots t$). The result yields the
linking probability $p(t+1 \to R, h)$ of $t + 1$ to $R$:

$$p(t+1 \to R, h) = \sum_{n=1}^{t} \sum_{r=1}^{n} \frac{r^{-\alpha}}{\sum_{m=1}^{n} m^{-\alpha}}\, h^{n} (1-h)^{t-n} \binom{R-1}{r-1} \binom{t-R}{n-r} \tag{5}$$



From Eq. (5) we see that if $h = 1$, which corresponds to
a list with all $t$ nodes, one recovers Eq. (1) as expected.
For $h < 1$, however, it is not possible to derive a closed-form
expression for $p(t+1 \to R, h)$, so we performed Monte
Carlo simulations of the process leading to Eq. (5). In
every simulation we produced a large number of lists, each
formed by sampling nodes with probability $h$. At the
beginning of the simulation we initialized all entries of the
array $p(t+1 \to R, h)$ to 0. Once a list was completed, we
added to the entries of $p(t+1 \to R, h)$ corresponding to
the nodes of the list the linking probability as given by
Eq. (1) (with the proper normalization). With this method
we simulated systems with up to $N = 10^6$ nodes.
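
For small $t$ the double sum can also be evaluated directly, which gives a check on the $h = 1$ limit. The sketch below assumes the form of Eq. (5) in which node $R$ is sampled together with $r - 1$ of the $R - 1$ older nodes and $n - r$ of the $t - R$ younger ones; the function name `linking_probability_partial` is hypothetical. It is a direct summation, not the Monte Carlo procedure described above.

```python
from math import comb


def linking_probability_partial(R, t, h, alpha):
    """Direct evaluation of the double sum for node R among t nodes,
    when each existing node is visible with probability h."""
    # partial_sums[n] = sum of m ** (-alpha) for m = 1..n
    partial_sums = [0.0]
    for m in range(1, t + 1):
        partial_sums.append(partial_sums[-1] + m ** (-alpha))

    total = 0.0
    for n in range(1, t + 1):
        # Probability of one specific list of size n out of t nodes.
        p_list = h ** n * (1.0 - h) ** (t - n)
        if p_list == 0.0:
            continue
        for r in range(1, n + 1):
            # Number of lists of size n in which node R has local rank r.
            ways = comb(R - 1, r - 1) * comb(t - R, n - r)
            if ways:
                total += ways * p_list * r ** (-alpha) / partial_sums[n]
    return total


# With h = 1 only the full list (n = t, r = R) contributes.
p_full = linking_probability_partial(3, 15, 1.0, 1.5)
```

Summed over all $R$, this expression gives $1 - (1 - h)^t$ rather than 1, because an empty list contributes no link; this is one reason a normalization step is needed when $h < 1$.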

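For $t$ not too small, $p(t+1 \to R, h)$ is well approximated
by the following function:

$$p(t+1 \to R, h) \simeq \begin{cases} \dfrac{(\alpha - 1)\, h^{\alpha}}{\alpha h^{\alpha - 1} - t^{1 - \alpha}} & \text{if } 1 \le R \le [1/h] \\[2ex] \dfrac{(\alpha - 1)\, R^{-\alpha}}{\alpha h^{\alpha - 1} - t^{1 - \alpha}} & \text{if } [1/h] < R \le t. \end{cases} \tag{6}$$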



The first $[1/h]$ nodes have the same probability of being
selected. For the other nodes, $p(t+1 \to R, h)$ has the same
dependence on $R$ as in the case in which there is complete
information on the network ($h = 1$). This means that in
a network grown with the linking probability of Eq. (6) the
first $[1/h]$ nodes will have approximately the same number
of links, whereas the degrees of the others will have the
distribution of Eq. (3). If $h$ is independent of $N$, the subset
of equiprobable nodes does not grow with $N$ and has no
structural relevance when $N \to \infty$; if $h \sim 1/N$ the
subset of equiprobable nodes is a fraction of the network and
the degree distribution has an exponential cutoff. Monte
Carlo simulations confirmed the result. We conclude that
the degree distribution of networks grown with our
ranking strategy is the same whether new nodes have access
to the full network or just to subsets of it, as long as
the subsets contain a constant proportion of the network
nodes. The latter assumption may not be realistic, as
the number of contacts may vary appreciably from node
to node. Next we extend our analysis to this case.
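
The two regimes can be made concrete with a short Python sketch. It assumes a particular reading of Eq. (6): a plateau $(\alpha - 1) h^{\alpha} / (\alpha h^{\alpha - 1} - t^{1 - \alpha})$ for $R \le [1/h]$ and a tail $(\alpha - 1) R^{-\alpha} / (\alpha h^{\alpha - 1} - t^{1 - \alpha})$ beyond it, with $\alpha \ne 1$. The function name and the parameter values are illustrative only.

```python
import math


def approx_linking_probability(R, t, h, alpha):
    """Piecewise approximation of the linking probability under partial
    information (assumed form; requires alpha != 1)."""
    denominator = alpha * h ** (alpha - 1) - t ** (1 - alpha)
    plateau_end = math.floor(1.0 / h)  # [1/h]: last equiprobable rank
    if R <= plateau_end:
        # Plateau: the first [1/h] nodes are equally likely targets.
        return (alpha - 1) * h ** alpha / denominator
    # Tail: same R ** (-alpha) dependence as with full information.
    return (alpha - 1) * R ** (-alpha) / denominator


# Illustrative values: 10**5 existing nodes, each visible with h = 0.01.
t, h, alpha = 10 ** 5, 0.01, 2.0
total = sum(approx_linking_probability(R, t, h, alpha) for R in range(1, t + 1))
```

Under this reading the two branches meet at $R = 1/h$, and the probabilities sum to approximately 1 over all ranks, consistent with replacing the sums by integrals.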

In general, if $h$ is distributed according to a function
$S(h)$, we need to convolve the $p(t+1 \to R, h)$ of Eq. (6)
with $S(h)$ to get the linking probability $p_S(t+1 \to R)$
of the full process,

$$p_S(t+1 \to R) = \int_{h_m}^{h_M} S(h)\, p(t+1 \to R, h)\, dh, \tag{7}$$

where $h_m$ and $h_M$ are the extremes of the interval where
$S(h)$ is defined. We consider the following simple
probability distribution for $h$:

$$S(h) = \frac{\beta - 1}{t^{\beta - 1} - 1}\, h^{-\beta} \tag{8}$$

where $\beta > 0$. The function is defined in the range
$h \in [1/t, 1]$. In fact, for a network with $t$ nodes, in order
to have at least one item in a random selection of nodes
one needs a probability $h > 1/t$. The prefactor ensures
the normalization of the function in the interval $[1/t, 1]$.
As we discuss later, the choice of the power law in Eq. (8) is
a realistic one; it also accounts for the two limit cases of
uniform ($\beta = 0$) and exponential ($\beta \to \infty$) distributions;
finally it allows us to treat the problem analytically,
provided we introduce reasonable approximations.
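
The normalization can be checked directly, assuming the power law form $S(h) \propto h^{-\beta}$ on $[1/t, 1]$ with $\beta \ne 1$:

$$\int_{1/t}^{1} h^{-\beta}\, dh = \left[ \frac{h^{1-\beta}}{1-\beta} \right]_{1/t}^{1} = \frac{t^{\beta - 1} - 1}{\beta - 1} ,$$

so multiplying $h^{-\beta}$ by $(\beta - 1)/(t^{\beta - 1} - 1)$ gives unit total probability. For $\beta = 0$ the prefactor reduces to $1/(1 - 1/t)$, the uniform density on $[1/t, 1]$.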

We plug Eqs. (6) and (8) into Eq. (7). With $R$ fixed, we
split the integral over the two $h$ domains corresponding
to the regimes of Eq. (6). Depending on the values of the
parameters $\alpha$ and $\beta$, we can neglect different terms in the
resulting integrands, leading to four cases for the
asymptotic dependence of the linking probability on rank.

FIG. 2: Degree distributions for networks grown according
to the proposed model, with incomplete information. Inset:
Regions of the parameter space for the four regime cases. The
marked points correspond to the $(\alpha, \beta)$ values that generate
the distributions in the main plot.

In three of the cases $p_S(t+1 \to R)$ has a power law
dependence on $R$: the networks grown for the corresponding
values of $\alpha$ and $\beta$ will then have scale-free degree
distributions. The power law distribution of the
probability $h$ affects the value of the exponent $\gamma$ of the degree
distribution, which no longer depends only on $\alpha$ and in
one case is only a function of $\beta$. In the last case the
linking probability is independent of the rank, so all nodes
have the same chance of being linked and the degree
distribution is exponential. The four cases correspond to
regions of the $(\alpha, \beta)$ parameter quadrant (Fig. 2, inset).
In cases a, b and c, $p(k) \sim k^{-\gamma}$ with $\gamma_a = 1 + 1/(2 - \beta)$,
$\gamma_b = 1 + 1/\alpha$, and $\gamma_c = 1 + 1/(1 + \alpha - \beta)$. In case d, $p(k)$
is exponential. Monte Carlo simulations confirm these
predictions, as shown in Fig. 2. The results are identical
if the nodes are ranked according to degree or fitness.

Compared to mechanisms proposed in the past to
explain the emergence of scale-free networks, the rank-based
model introduced here presents three main
advantages. First, it assumes less information is available
to nodes (or node creators); it seems more realistic in
many real cases to imagine that the relative importance
of items is easier to access than their absolute
importance. Second, the link attractiveness of nodes is by no
means restricted to topology; it can depend on
exogenous attributes of the nodes, which makes our model
suitable for applications in many different contexts. Third,
the criterion is more robust in that: (i) it naturally
extends to directed networks; (ii) it leads to long-tailed
degree distributions for a broad class of linking probability
functions, namely power laws of rank with any
exponent $\alpha > 0$, including the degenerate exponential case
for $\alpha \to \infty$; and (iii) the scale-free degree distribution
generated by the model is not affected by limiting the
information available to subsets of nodes.

The rank-based model is directly applicable to the Web
as a special case, if one considers the role of search
engines in the discovery of pages. When a user submits
a query, the search engine ranks the results by various
criteria including a topological prestige measure, PageRank,
closely correlated with in-degree. Users do not know the
PageRank of the search hits, but observe their ranking
and thus are more likely to discover and link pages that
are ranked near the top. By analyzing search engine logs
one finds that (i) users click on result hits with a
probability that is a power law function of rank matching
Eq. (1) with $\alpha = 1.6$; and (ii) user queries return hit sets
whose size distribution matches the power law in Eq. (8)
with $\beta = 1.1$. Assuming that users tend to link pages
that they discover by searching, and that they are only
aware of the pages returned by search engines in response
to their queries, our model predicts a scale-free
distribution of in-degree with exponent
$\gamma_a = 1 + 1/(2 - \beta) \approx 2.1$ (cf. case a and
curve marked with circles in Fig. 2). This is in perfect
agreement with established Web measurements.
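
The arithmetic behind this prediction, and the exponents of the other power law cases, can be reproduced with a few helper functions. The function names are hypothetical, and the sketch does not encode which region of the $(\alpha, \beta)$ quadrant each case occupies, since that is shown only in the inset of Fig. 2.

```python
def gamma_case_a(beta):
    """Exponent in case a: depends only on beta."""
    return 1 + 1 / (2 - beta)


def gamma_case_b(alpha):
    """Exponent in case b: same as with complete information."""
    return 1 + 1 / alpha


def gamma_case_c(alpha, beta):
    """Exponent in case c: depends on both alpha and beta."""
    return 1 + 1 / (1 + alpha - beta)


# Web graph values quoted in the text: alpha = 1.6, beta = 1.1 (case a).
web_exponent = gamma_case_a(1.1)
```

With $\beta = 1.1$ the case a formula gives $1 + 1/0.9 \approx 2.11$, which rounds to the quoted value of 2.1.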